Extracts BitString from code point to BitString class by mward-sudo · Pull Request #723 · bartblast/hologram

mward-sudo · 2026-02-19T01:50:43Z

Closes #720

Dependencies

Please note that this PR includes commits from the PR(s) it is dependent upon. Once the dependent PR(s) are merged to the dev branch, then this PR will be rebased and will then only contain its own commits. This PR will remain in draft until that point.

Summary by CodeRabbit

New Features
- Added UTF-8 character encoding and decoding capabilities with comprehensive validation and handling.
Refactor
- Streamlined UTF-8 processing with centralized utility functions, improving code maintainability and consistency.
Tests
- Added extensive test coverage for UTF-8 operations, character encoding, decoding, and validation scenarios.


coderabbitai · 2026-02-19T01:51:03Z

📝 Walkthrough

Walkthrough

Added UTF-8 utilities to the Bitstring class for decoding/encoding code points and validating UTF-8 sequences. Refactored unicode.mjs to use these new Bitstring helpers instead of custom UTF-8 validation and codepoint extraction logic. Added comprehensive test coverage for the new utilities.
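
For readers unfamiliar with the encoding, the code point conversion that these helpers centralize can be sketched roughly as follows. The function names `encodeCodePoint` and `decodeCodePoint` are hypothetical stand-ins and are not the methods added in this PR; the sketch also assumes its input is already a valid code point and does no validation.

```javascript
// Hypothetical standalone helpers; NOT the Bitstring methods from this PR.
// Assumes `codePoint` is a valid Unicode scalar value (no validation here).

// Encodes a single code point as an array of UTF-8 bytes.
function encodeCodePoint(codePoint) {
  if (codePoint < 0x80) return [codePoint];
  if (codePoint < 0x800) {
    return [0xc0 | (codePoint >> 6), 0x80 | (codePoint & 0x3f)];
  }
  if (codePoint < 0x10000) {
    return [
      0xe0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3f),
      0x80 | (codePoint & 0x3f),
    ];
  }
  return [
    0xf0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3f),
    0x80 | ((codePoint >> 6) & 0x3f),
    0x80 | (codePoint & 0x3f),
  ];
}

// Decodes one code point of known byte length starting at index `start`.
function decodeCodePoint(bytes, start, length) {
  if (length === 1) return bytes[start];
  // Keep only the payload bits of the lead byte.
  let codePoint = bytes[start] & (0x7f >> length);
  for (let i = 1; i < length; i++) {
    // Each continuation byte contributes its low 6 bits.
    codePoint = (codePoint << 6) | (bytes[start + i] & 0x3f);
  }
  return codePoint;
}

const euroBytes = encodeCodePoint(0x20ac); // U+20AC EURO SIGN
console.log(euroBytes.map((b) => b.toString(16)).join(" ")); // e2 82 ac
console.log(decodeCodePoint(euroBytes, 0, 3).toString(16)); // 20ac
```

The round trip through the euro sign shows the three-byte case; the one-, two-, and four-byte cases follow the same pattern with different lead-byte prefixes.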

Changes

| Cohort / File(s) | Summary |
|---|---|
| **UTF-8 Utilities in Bitstring** `assets/js/bitstring.mjs` | Added 8 new static UTF-8 helper methods: decodeUtf8CodePoint, fromCodepoint, toCodepointArray, getValidUtf8Length, isValidUtf8CodePoint, isValidUtf8ContinuationByte, isValidUtf8Sequence, isTruncatedUtf8Sequence. These centralize UTF-8 encoding/decoding and validation logic for use across the codebase. |
| **Unicode Module Refactoring** `assets/js/erlang/unicode.mjs` | Replaced low-level bitstring construction and custom UTF-8 validation with calls to new Bitstring utilities. Updated characters_to_binary/3, handleInvalidUtf8FromBinary, handleInvalidUtf8FromList, and other functions to use Bitstring.fromCodepoint, Bitstring.toCodepointArray, Bitstring.getValidUtf8Length, and Bitstring.isTruncatedUtf8Sequence instead of manual implementations. |
| **UTF-8 Utilities Test Coverage** `test/javascript/bitstring_test.mjs` | Added comprehensive test suite for new UTF-8 utilities covering decodeUtf8CodePoint, fromCodepoint, toCodepointArray, getValidUtf8Length, isTruncatedUtf8Sequence, isValidUtf8CodePoint, isValidUtf8ContinuationByte, and isValidUtf8Sequence with various edge cases and byte sequences. |
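
To illustrate what a "valid UTF-8 length" scan and a truncation check might involve, here is a minimal sketch. It validates by byte ranges (the well-formed byte sequence ranges from the Unicode Standard) rather than by decoding, which may differ from the approach taken in the PR. All function names here (`sequenceLength`, `validPrefixLength`, `isTruncated`, and so on) are hypothetical and do not correspond to the Bitstring methods listed above.

```javascript
// Hypothetical sketch; names and structure are assumptions, not the PR's code.

// Expected sequence length for a lead byte, or 0 if it cannot start a sequence.
function sequenceLength(lead) {
  if (lead < 0x80) return 1;
  if (lead >= 0xc2 && lead <= 0xdf) return 2;
  if (lead >= 0xe0 && lead <= 0xef) return 3;
  if (lead >= 0xf0 && lead <= 0xf4) return 4;
  return 0; // continuation bytes, 0xc0, 0xc1, and 0xf5..0xff
}

function isContinuation(byte) {
  return (byte & 0xc0) === 0x80;
}

// Allowed range of the second byte; the special cases rule out overlong
// forms, surrogates, and code points above U+10FFFF.
function secondByteRange(lead) {
  if (lead === 0xe0) return [0xa0, 0xbf];
  if (lead === 0xed) return [0x80, 0x9f];
  if (lead === 0xf0) return [0x90, 0xbf];
  if (lead === 0xf4) return [0x80, 0x8f];
  return [0x80, 0xbf];
}

// True when bytes[start .. start + length) is one well-formed sequence.
function isWellFormed(bytes, start, length) {
  if (start + length > bytes.length) return false;
  if (length === 1) return true;
  const [lo, hi] = secondByteRange(bytes[start]);
  if (bytes[start + 1] < lo || bytes[start + 1] > hi) return false;
  for (let i = 2; i < length; i++) {
    if (!isContinuation(bytes[start + i])) return false;
  }
  return true;
}

// Number of leading bytes that form complete, well-formed sequences.
function validPrefixLength(bytes) {
  let pos = 0;
  while (pos < bytes.length) {
    const length = sequenceLength(bytes[pos]);
    if (length === 0 || !isWellFormed(bytes, pos, length)) break;
    pos += length;
  }
  return pos;
}

// True when a sequence starting at `start` is cut off by the end of input.
function isTruncated(bytes, start) {
  const length = sequenceLength(bytes[start]);
  if (length === 0 || start + length <= bytes.length) return false;
  for (let i = start + 1; i < bytes.length; i++) {
    if (!isContinuation(bytes[i])) return false;
  }
  return true;
}

console.log(validPrefixLength([0x61, 0xe2, 0x82, 0xac, 0xff])); // 4
console.log(isTruncated([0x61, 0xe2, 0x82], 1)); // true
```

Distinguishing a truncated tail from an outright invalid byte matters when a caller needs to report incomplete input separately from malformed input.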

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Port :unicode.characters_to_nfkd_binary/1 to JS #578: Replaces UTF-8 validation and sequence functions (findValidUtf8Length, isValidSequence) used in characters_to_nfkd_binary with centralized Bitstring utilities.
Extract UTF-8 sequence length detection to Bitstring class #706: Also modifies Bitstring UTF-8 utilities (getUtf8SequenceLength) and refactors erlang/unicode.mjs to use Bitstring UTF-8 helpers.
Port :unicode.characters_to_nfc_list/1 to JS #580: Similarly modifies UTF-8 and codepoint handling in erlang/unicode.mjs for characters_to_nfc_list/1 implementation.

Suggested reviewers

bartblast

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main objective of moving BitString code point conversion utilities into the BitString class, matching the core intent of issue `#720`. |
| Linked Issues check | ✅ Passed | The PR successfully implements the requirement from issue `#720` by adding toCodepointArray(bitstring) as a static method and providing comprehensive UTF-8 conversion utilities in the BitString class. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to extracting BitString/code point utilities into the BitString class. The refactoring in unicode.mjs uses the new utilities as intended, with no extraneous modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai

🧹 Nitpick comments (1)

assets/js/bitstring.mjs (1)

259-259: Allocating lookup-table objects per call in hot-path methods.

Both decodeUtf8CodePoint (line 259) and isValidUtf8CodePoint (line 630) recreate a small object literal on every invocation. These are called in the O(n) scan inside getValidUtf8Length, once per multi-byte sequence.

The firstByteMasks values follow the pattern 0x7f >> length, which avoids the allocation entirely, and minValueForLength can be a module-level constant array.
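
The claimed equivalence is easy to check in isolation. The snippet below compares the lookup table quoted in this comment against the shift expression; `firstByteMask` is a hypothetical helper name used only for this check.

```javascript
// Lookup table as quoted in the review comment.
const firstByteMasks = {2: 0x1f, 3: 0x0f, 4: 0x07};

// Computed alternative: shift 0x7f right by the sequence length.
// (Hypothetical helper name, used only for this comparison.)
function firstByteMask(length) {
  return 0x7f >> length;
}

for (const length of [2, 3, 4]) {
  const computed = firstByteMask(length);
  console.log(length, computed.toString(16), computed === firstByteMasks[length]);
}
// 2 1f true
// 3 f true
// 4 7 true
```

Note that the formula only holds for lengths 2 to 4: `0x7f >> 1` is `0x3f`, not the `0x7f` mask a single-byte sequence would need, so the early return for `length === 1` has to stay in place.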

♻️ Zero-allocation alternatives

  static decodeUtf8CodePoint(bytes, start, length) {
    if (length === 1) return bytes[start];

-   // First byte masks: 2-byte=0x1f, 3-byte=0x0f, 4-byte=0x07
-   const firstByteMasks = {2: 0x1f, 3: 0x0f, 4: 0x07};
-
-   let codePoint = bytes[start] & firstByteMasks[length];
+   // First byte masks: 2→0x1f, 3→0x0f, 4→0x07 (formula: 0x7f >> length)
+   let codePoint = bytes[start] & (0x7f >> length);

    for (let i = 1; i < length; i++) {

For isValidUtf8CodePoint, hoist minValueForLength to a module-level (or class-level) constant:

+// Module-level constant; indices 0–4, only 1–4 used.
+const UTF8_MIN_CODE_POINT = [0, 0, 0x80, 0x800, 0x10000];

  static isValidUtf8CodePoint(codePoint, encodingLength) {
-   const minValueForLength = {1: 0, 2: 0x80, 3: 0x800, 4: 0x10000};
-   if (codePoint < minValueForLength[encodingLength]) return false;
+   if (codePoint < UTF8_MIN_CODE_POINT[encodingLength]) return false;

Also applies to: 630-630
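
As a self-contained illustration of the hoisted constant, the sketch below shows a code point check that rejects overlong encodings via the shared array. Only the minimum-value comparison comes from the diff above; the surrogate and upper-bound checks, and the function name `isValidCodePointForLength`, are assumptions added to make the example complete.

```javascript
// Shared constant, allocated once; indices 0-4, only 1-4 are meaningful.
const UTF8_MIN_CODE_POINT = [0, 0, 0x80, 0x800, 0x10000];

// Hypothetical standalone version of the check, not the PR's method.
function isValidCodePointForLength(codePoint, encodingLength) {
  // Overlong encoding: the value would have fit in fewer bytes.
  if (codePoint < UTF8_MIN_CODE_POINT[encodingLength]) return false;
  // Assumed extra checks: UTF-16 surrogates and the Unicode upper bound.
  if (codePoint >= 0xd800 && codePoint <= 0xdfff) return false;
  return codePoint <= 0x10ffff;
}

console.log(isValidCodePointForLength(0x00, 2)); // false (overlong NUL, bytes C0 80)
console.log(isValidCodePointForLength(0x20ac, 3)); // true
```

One behavioral detail carries over unchanged from the object literal: an out-of-range `encodingLength` indexes to `undefined`, and `codePoint < undefined` is `false`, so the overlong check is skipped in that case for both the object and the array form.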

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@assets/js/bitstring.mjs` at line 259, The code repeatedly allocates small
lookup objects inside hot-path functions decodeUtf8CodePoint and
isValidUtf8CodePoint (e.g. firstByteMasks and minValueForLength); hoist these to
module-level constants and replace the firstByteMasks literal with the computed
pattern (use 0x7f >> length) or a precomputed array to avoid per-call object
creation, and make minValueForLength a shared constant array used by both
functions.


mward-sudo added 8 commits February 18, 2026 22:39

Extracts UTF-8 code point decoding to the BitString utility class, adds parameter validation

9a49bec

Extracts UTF-8 continuation byte validation to BitString class

105d2d9

Extracts Utf-8 code point validation to BitString class

7015625

Extracts UTF-8 sequence validation to BitString class

b9bfe2d

Extracts truncated UTF-8 sequence validation to BitString class

9ab7bc3

Extracts valid UTF-8 sequence length to BitString class

f458bf4

Extracts BitString to code points array to BitString class

8d3af8d

Extracts BitString from code point to BitString class

efef2de

coderabbitai bot reviewed Feb 19, 2026

mward-sudo wants to merge 8 commits into bartblast:dev from mward-sudo:02-19-extracts_bitstring_from_code_point_to_bitstring_class
